Linear Regression¶
Simple Linear Regression¶
Figure 1 : Dataset of heights and corresponding weights of randomly sampled people
Problem : How can we create a Machine Learning model for the data in figure 1 that predicts weight, given height?¶
Here we try to fit a simple straight line to this data.
Once we find the best-fit line, we can predict the weight for any new value of height.
The simple linear equation is:
$$\mathbf{y} = m\mathbf{x} + c $$
where we need to estimate the parameters: the y-intercept ($c$) and the slope ($m$).
Simple linear regression is a procedure for finding the values of $c$ and $m$ that best fit the given data.
Let's see simple linear regression performed on the scatter plot of height vs. weight.
Figure 2 : Best-fit line plotted over the data of figure 1.
(We will later go into detail regarding the procedure for finding the best-fit line.)
The red line is a simple linear regression line with output $\mathbf{y}$ as weight and $\mathbf{x}$ as height.
The error, $\epsilon$, is the difference between the actual value, $y_i$, and the predicted value, $\hat{y_i}$.
The actual output data points are the blue dots; the predicted values are their projections onto the red regression line.
The error for each data point is shown by the vertical distance from the actual output data point to the predicted point on the regression line.
The predicted output value is:
$$\hat{y_i} = mx_i+ c$$
The actual output value is:
$$y_i = mx_i+ c + \epsilon_i$$
where $\epsilon_i$ is the error. The error $\epsilon_i = y_{i}-\hat{y_{i}}$ can be positive, negative, or even zero.
We can see in the figure that the errors, represented by vertical lines, lie on either side of the regression line.
We square each error and sum them; the result is called the Sum of Squared Errors.
$$\text{Sum of Squared Errors (SSE)} = \sum_{i=1}^{n}(y_{i}-\hat{y_{i}})^2$$
The summation runs from $1$ to $n$, since we have $n$ samples. Changing $m$ and $c$ changes the Sum of Squared Errors.
The main principle is to choose the intercept ($c$) and slope ($m$) such that this overall sum is minimized.
Sum of Squared Errors (SSE) can also be written as:
$$\text{SSE} = \sum_{i=1}^{n}(y_{i}-\hat{y_{i}})^2 =\sum_{i=1}^{n}(y_{i}-(c+mx_i))^2 $$
Here, $\hat{y_i}$ is replaced with the simple linear regression model equation, i.e. $\hat{y_i} = mx_i + c$.
Since $\text{SSE}$ is a sum of squared terms, it is always non-negative. Plotting $c$ and $m$ on the x- and y-axes and the corresponding SSE on the z-axis gives a 3D surface that is convex, opening upwards.
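To make this concrete, SSE can be evaluated directly for candidate values of $m$ and $c$. Here is a minimal sketch using a tiny made-up dataset (the numbers are illustrative only):

```python
import numpy as np

# Tiny made-up dataset (illustrative heights in cm and weights in kg)
x = np.array([160.0, 170.0, 180.0])
y = np.array([60.0, 70.0, 80.0])

def sse(m, c):
    """Sum of Squared Errors for the line y_hat = m*x + c."""
    y_hat = m * x + c
    return np.sum((y - y_hat) ** 2)

# A poor guess yields a large SSE; the line y = x - 100 happens to fit
# this particular data exactly, so its SSE is zero
print(sse(0.5, 0.0))     # -> 725.0
print(sse(1.0, -100.0))  # -> 0.0
```

Different choices of $m$ and $c$ give different SSE values; the goal is to find the pair that minimizes this quantity.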
The parameters at the minimum point are obtained, via calculus, using Gradient Descent, which remains one of the most important concepts in the AI domain.
Before diving into gradient descent, we need a clear idea of partial derivatives.
Partial Derivative¶
Partial derivatives exist for functions that have two or more variables.
For the function $\text{SSE} =\sum_{i=1}^{n}(y_{i}-(c+mx_i))^2$, there are two variables, $m$ and $c$.
So the derivative of $\text{SSE}$ can be taken with respect to either $m$ or $c$.
The derivative of $\text{SSE}$ with respect to $m$, with $c$ held constant, is known as the partial derivative of $\text{SSE}$ with respect to $m$, denoted by $\frac{\partial }{\partial m }\sum(y_i-(mx_i+c))^2$.
Similarly, the derivative of $\text{SSE}$ with respect to $c$, with $m$ held constant, is known as the partial derivative of $\text{SSE}$ with respect to $c$, denoted by $\frac{\partial }{\partial c }\sum(y_i-(mx_i+c))^2$.
The partial derivative of SSE with respect to $m$ is
$\frac{\partial }{\partial m }\sum(y_i-(m *x_i+c))^2$
= $\sum\frac{\partial }{\partial m}(y_i-(m *x_i+ c ))^2 $
= $ \sum2(y_i-(m *x_i + c ))(-x_i) $
Similarly, the partial derivative of SSE with respect to $c$ is
$\frac{\partial }{\partial c }\sum(y_i-(m *x_i+c))^2$
= $\sum\frac{\partial }{\partial c}(y_i-(m *x_i+ c ))^2 $
= $ \sum2(y_i-(m *x_i + c ))(-1) $
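As a sanity check, these two formulas can be compared against numerical finite-difference derivatives on a small made-up dataset (all values below are arbitrary illustrations):

```python
import numpy as np

# Small made-up dataset and an arbitrary candidate line
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
m, c = 0.5, 1.0

def sse(m, c):
    return np.sum((y - (m * x + c)) ** 2)

# Analytic partial derivatives from the formulas above
d_sse_dm = np.sum(2 * (y - (m * x + c)) * (-x))
d_sse_dc = np.sum(2 * (y - (m * x + c)) * (-1))

# Numerical check via central finite differences
h = 1e-6
num_dm = (sse(m + h, c) - sse(m - h, c)) / (2 * h)
num_dc = (sse(m, c + h) - sse(m, c - h)) / (2 * h)

print(d_sse_dm, num_dm)  # both close to -30.0
print(d_sse_dc, num_dc)  # both close to -12.0
```

The analytic and numerical values agree, which confirms the signs in the formulas above.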
We can visualize the partial derivatives in the interactive as follows:¶
On selecting the $\frac{\partial }{\partial c }SSE$ view:¶
- We get a yellow vertical plane (parallel to the c-axis and perpendicular to the m-axis) intersecting the SSE plot.
- On the region of intersection we can see a white curve.
- The derivative of this white curve at the specific point (red dot) is the partial derivative of SSE with respect to $c$.
On selecting the $\frac{\partial }{\partial m }SSE$ view:¶
- We get a red vertical plane (parallel to the m-axis and perpendicular to the c-axis) intersecting the SSE plot.
- On the region of intersection we can see a white curve.
- The derivative of this white curve at the specific point (red dot) is the partial derivative of SSE with respect to $m$.
Gradient Descent¶
Let's proceed to the working mechanism of gradient descent.
Initially, the partial derivatives are calculated at random initial values of $m$ and $c$, denoted $m_0$ and $c_0$ respectively,
i.e. at ($m_0$ , $c_0$):
$\frac{\partial }{\partial m } SSE$ =$ \sum2(y_i-(m_0*x_i + c_0))(-x_i)$
$\frac{\partial }{\partial c } SSE$ = $ \sum2(y_i-(m_0*x_i + c_0))(-1)$
There might exist better values of $m$ and $c$ that lead to a lower value of SSE.
Hence, the new values of $m$ and $c$ are evaluated by:
$m_1$ = $m_0$ - learning_rate * $\frac{\partial }{\partial m }SSE(m_0, c_0)$
$c_1$ = $c_0$ - learning_rate * $\frac{\partial }{\partial c }SSE(m_0, c_0)$
where the learning_rate is a multiplicative factor that decides how quickly we change the values of $m$ and $c$ to reach the optimum.
In this way we calculate $(m_2, c_2), (m_3, c_3), \ldots$ and so on.
This process continues until there is no significant change in the value of SSE when $m$ and $c$ are updated.
The procedure mentioned above is known as gradient descent.
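The procedure above can be sketched directly in NumPy. This is a minimal illustration on made-up, noise-free data; the learning rate and iteration count are assumptions tuned for this toy example, not general-purpose settings:

```python
import numpy as np

# Made-up noise-free data lying exactly on y = 2x + 3,
# so gradient descent should recover m close to 2 and c close to 3
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 3.0

m, c = 0.0, 0.0        # initial values m0, c0
learning_rate = 1e-4   # assumed small enough for stable updates here

for _ in range(50000):
    residual = y - (m * x + c)
    grad_m = np.sum(2 * residual * (-x))  # partial derivative of SSE w.r.t. m
    grad_c = np.sum(2 * residual * (-1))  # partial derivative of SSE w.r.t. c
    m = m - learning_rate * grad_m
    c = c - learning_rate * grad_c

print(m, c)  # close to 2.0 and 3.0
```

In practice the loop would stop once SSE changes by less than some tolerance between iterations, exactly as described above, rather than running a fixed number of steps.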
Implementation on Real World Dataset¶
Here is the link to the dataset: https://drive.google.com/file/d/1xoZ51eaK-NfLfH_0L7UB6AtNlwba_Xdx/view?usp=sharing
This dataset has one input, height in cm, and one output, weight in kg.
Imports¶
import numpy as np
import pandas as pd
import matplotlib as mp
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
data_path = "https://drive.google.com/uc?export=download&id=1xoZ51eaK-NfLfH_0L7UB6AtNlwba_Xdx"
# we can also download the data and mention the directory within own computer as data path
# Read the CSV data from the link
data_frame = pd.read_csv(data_path)
# Print first 5 samples from the DataFrame
data_frame.head()
| | Height | Weight |
|---|---|---|
| 0 | 182.928810 | 97.097585 |
| 1 | 192.911362 | 91.056023 |
| 2 | 186.165803 | 87.345558 |
| 3 | 182.692991 | 89.157395 |
| 4 | 175.419288 | 85.219482 |
Training Simple Linear Regression¶
# Extract X and y
X = data_frame.iloc[:, 0].values.reshape(-1, 1) # Select the first column (X)
y = data_frame.iloc[:, 1] # Select the second column (y)
# Initialize Linear Regression model
model = LinearRegression()
# Fit the model
model.fit(X, y)
# Print trained parameters
print("Trained slope (m) :", model.coef_)
print("Trained intercept (C):", model.intercept_)
# Make predictions
y_pred = model.predict(X)
# Calculate Mean Squared Error
# Mean Squared Error = SSE / n (n is the number of samples)
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)
Trained slope (m) : [0.46349633]
Trained intercept (C): 6.010259748648124
Mean Squared Error: 22.826601317067553
Visualization¶
# Plot the data and the linear regression line
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('Height in cm ')
plt.ylabel('Weight in kg ')
plt.title('Linear Regression Fit: Height vs Weight ')
plt.legend()
plt.show()
Predict for a query¶
predicted_value = model.predict([[175]]) # we can change the value to get the prediction
print("Predicted weight for given height is :", predicted_value, 'kg ' )
Predicted weight for given height is : [87.12211777] kg
